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Abstract 

A recently proposed mutual information based algorithm for decomposing data into 
least dependent components (MILCA) is applied to spectral analysis, namely to 
blind recovery of concentrations and pure spectra from their linear mixtures. The al- 
gorithm is based on precise estimates of mutual information between measured spec- 
tra, which allows to assess and make use of actual statistical dependencies between 
them. We show that linear filtering performed by taking second derivatives effec- 
tively reduces the dependencies caused by overlapping spectral bands and, thereby, 
assists resolving pure spectra. In combination with second derivative preprocessing 
and alternating least squares postprocessing, MILCA shows decomposition perfor- 
mance comparable with or superior to specialized chemometrics algorithms. The 
results are illustrated on a number of simulated and experimental (infrared and 
Raman) mixture problems, including spectroscopy of complex biological materials. 

Key words: multivariate curve resolution, independent component analysis (ICA), 
mutual information, MILCA, nonnegativity 



1 Introduction 



The problem of estimating parameters (concentrations and pure spectra) of 
a linear mixture model underlying a set of spectroscopic measurements has 
spurred growth of more than 20 algorithms [1] that now form an arsenal known 
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as multivariate self-modeling curve resolution tools (for recent reviews see 
[2,3,4,5,6,7.8]). Although advanced to the point that they have been imple- 
mented as commercial software (e.g., [9,10,11,12]), these techniques still leave 
room for new developments [13,14,15]. 

In their pioneering work, Lawton and Sylvestre [16] proposed decomposi- 
tion of mixed spectral signals into independent pure components. In practice, 
this will not be feasible, since different chemical species do not necessarily 
have completely independent spectra (e.g., because they may contain iden- 
tical or similar functional groups). Thus, any residual statistical dependen- 
cies in the recovered sources might either signal a faihirc of the method, or 
reflect the fact that the goal of achieving independent spectra was inconsis- 
tent. Aside from that, there have been only relatively few separate attempts 
[17,18,19,20,21,22,23,24,25,26,27,28,29,30,31] to use statistical (in) dependence 
of recovered sources (measured, e.g., by mutual information [32]) as a crite- 
rion in multivariate curve resolution. However, a more systematic study of 
applicability of these methods in spectral analysis is due. 

Viewed in a broader context, the use of these statistical dependencies consti- 
tutes the basis for the rapidly growing field of Independent Component Analy- 
sis (ICA) or, more generally. Blind Source Separation (BSS) [33,34,35,36,37,38,39,40,41]. 
One of the characteristic features of multivariate spectral curve resolution is 
that in most spectroscopic techniques the instrumental signals are nonncga- 
tive. In ICA or BSS, sources are not generally assumed to be of definite sign, 
although there are a number of papers in the BSS literature [42,43,44,45,46,47] 
that deal specifically with nonnegative sources. Clearly, such methods capa- 
ble of efficient decomposing nonnegative mixtures can be of potential use in 
analytical practice. 

The ICA approach rests on the underlying assumption of statistical indepen- 
dence of original (pure) sources [38,39]. In such a case pure signals can be 
recovered by finding a demixing transformation that minimizes dependencies 
between estimates. If the original independent sources and concentrations are 
positive, then also the estimates should be so [45] and nonnegativity can be 
used as a test of decomposition. Alternatively, nonnegativity may serve as a 
target property in optimization, together with other properties such as max- 
imal independence (see, for example, the Nonnegative PCA algorithm [46] 
and several chemometrics techniques that use the nonnegativity constraint 
[1,11,14,15,16,48,49]). 

As we already pointed out, in realistic problems the original spectral signals 
are not perfectly independent ("overlapping bands"). These dependencies orig- 
inate from similarities in the chemical structure, e.g., due to the presence of 
similar functional groups. Such mixtures are typically hard to decompose. This 
raises difficulties in straightforward use of general purpose ICA algorithms [49] , 
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which can only be studied by detailed simulations. Here we present a compar- 
ative study based on extensive statistics and show how the ICA methodology 
applies to realistic chemometrics problems. 

In spectral analysis, the overlap problem has motivated the development of 
specialized curve resolution methods. For instance, one group of algorithms is 
based on the idea of "pure variables" [50]. A "pure variable" is a wavelength 
at which only one of the components contributes. Isolation of pure variables 
for all mixture components is then a key to decomposition. This way to avoid 
the problem of overlapping bands was successfully implemented in several 
algorithms, e.g., KSFA [51], SIMPLISMA [9], IPCA [52] and, more recently, 
SMAC [53]). Also, Band-Target Entropy Minimization (BTEM [15,54]) has 
been proposed which involves an explicit (made by visual inspection [15]) 
choice of spectral features (target regions) to be retained in the course of 
constrained optimization. While efficient and highly flexible, supervised band 
selection has the drawbacks of being neither fully automated nor completely 
blind [2]. 

Since ICA is in most applications confronted with signals that cannot be 
linearly decomposed into independent sources, a more proper name would 
in such cases be the "Least dependent Component Analysis". Analyzing de- 
pendencies between reconstructed sources should then be an integral part of 
the method. A technical problem for such an analysis has been the need for 
fast, robust, precise and bias free estimators for mutual information (MI). 
In a recent paper we employed a novel MI estimator with notably improved 
properties [55] in what we called the Mutual Information based Least depen- 
dent Component Analysis (MILCA [56]). MILCA combines several features: 
(i) high performance ICA algorithm; (ii) output reliability tests; (iii) clus- 
ter analysis of reconstructed sources for residual interdependencies; and (iv) 
joining highly dependent sources into multi-dimensional sources. It has been 
extensively tested and compared to other ICA algorithms on various blind sep- 
aration problems. MILCA was found to outperform existing algorithms based 
on cruder approximations of mutual information (or other measures of inde- 
pendence) [56]. Some of those, e.g., FastICA [17,18,19,20,21,22,23,24,25,26,35], 
JADE [20,27,28,40], Infomax [29,41], SOBI [31,36] had been employed in the 
earlier applications of ICA to spectral curve resolution. These results suggest 
that further progress in developing high-performance ICA methods may well 
determine their place in practical analytical spectroscopy. 

In this paper we study the potential of MILCA, assisted by proper data pre- 
processing and postprocessing, in model-free blind spectral curve resolution. 
Unlike specialized chemometrics techniques, MILCA does not make use of 
nonnegativity, nor does it rely on band selection in any form. One could of 
course include violations of nonnegativity in some way as a contribution to 
the cost function (which would otherwise be just the mutual information) to 
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be minimized. However, we will not do this here, partly because it is not a 
priori clear how to weigh these two contributions. Instead, the present study 
elaborates on how violations of nonnegativity correlate with the quality of 
decomposition, and we will make direct comparisons with other chemometrics 
algorithms. 

The paper has the following structure. Section 2 introduces our approach 
to computing mutual information. It gives a description of the MILCA algo- 
rithm along with the preprocessing, optional corrections for residual negativity 
(postprocessing) and measures of performance used. In Section 3 the data sets 
analyzed in this work are briefly described. Section 4 concentrates on main 
results and, finally, conclusions are in Section 5. 



2 Methods 



2.1 Mutual information 



For a multivariate continuous random variable (Xi, X2, Xm) with given 
marginal and joint densities lJ.i{xi) and iJ,{xi,X2, ...,Xm), the mutual informa- 
tion is given by [32,55,56,57] 

M 

I{Xi,X2, Xm) = XI H{Xi) — H{Xi, X2, Xm), (1) 

i=l 



where 



H{Xi) = - j fXiln/^idxi (2) 



and 



H{Xi,X2, Xm) = — y /iln// dxidx2...dxM 



(3) 



are the differential entropies [32]. 

The MI is a measure of statistical dependence of the M variables, which 
means that it is zero if they are strictly independent (i.e., their joint den- 
sity factorizes, ^ = YlifJ-i), and it is positive otherwise. The advantages of 
MI is that it is sensitive to all types of dependencies (while, e.g., Pearson's 
coefficient is sensitive only to linear correlations) and it has a well defined 
information theoretic meaning. Also, due to the grouping property [55], it can 
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be decomposed for any partitioning of the set {Xi, X2, Xm} into depen- 
dencies within groups of Xi and dependencies between these groups, e.g., 
I{Xi,X2,Xs) = I{Xi,X2) + I{{Xi,X2),Xs). This leads directly to a con- 
ceptually extremely simple method of mutual information based hierarchical 
clustering (MIC [57]). 

When the variables Xi represent experimental measurements, information 
about them is usually given by a finite number of samples (realizations) 
k — 1,2, ...,A^, and MI has to be estimated by statistical inference. 
In our present case, the raw data will be an M x AT matrix X of M spectra 
Xj sampled at N wavelengths (frequencies) u'^ each 



1 2 



X 



N 



X 



N 

Xm j 



(4) 



Note that we view each spectrum as a random variable Xi (i = 1,2 ... , M) 
and the N spectral values xf, k = 1, 2, . . . , as its realizations. This should 
be contrasted to the alternative point of view where each spectral value at 
a given frequency i/'^ defines a random variable X^, k — 1,2, . . . , N, and 
Xi, i — 1,2, . . . , M are its realizations. Our task consists in computing (and 
subsequently minimizing) the MI estimator /(X) of the spectral data given by 
Eq. (4). Obviously, the realizations = (xJ, Xg, . . . , x^) are not independent 
of each other, but in the following we shall neglect this and use estimators 
developed for independent identically distributed realizations. 

The estimators given in [55] (which are actually closely related to the estimator 
for differential Shannon entropies used in [58]) are based on A;-nearest neighbor 
statistics. They have been shown to give rather precise estimates of Ml in 
any dimension M. In particular, they seem to be numerically free of bias for 
independent signals (when MI is zero). But even when this is not the case, 
their bias and variance seem to be smaller than those of other estimators [55] . 



2.2 Least dependent component approach 



As in the basic multivariate curve resolution setting, linear ICA starts out 
with a mixture model in the form 

X = AS, (5) 
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where X is a Af x matrix of mixed signals, S is a K x N matrix of unknown 
pure components (si, S2, s^), and A is an Mx mixing matrix (concentra- 
tions). The problem is to reconstruct blindly S and A from the observations 
X, assuming that the original sources are as "independent" as possible. More 
precisely, the demixing transformation W (which is an estimate for A~^ with 
the superscript —1 denoting matrix inverse or pscudoinverse) is sought such 
that it minimizes the mutual information estimator /(Y) of the estimated 
components Y = WX. The ICA decomposition is defined up to scaling (in- 
tensity ambiguity [59]) and permutation of sources [45]. In the simplest version 
of the MILCA algorithm which we apply here, the MI between components yj 
is computed at equal frequencies i^*^ (possible dependencies between sources 
at different frequencies can be assessed, e.g., using "delay" vectors; for details 
on this technique and also for a noninstantaneous mixing ansatz see [56]). 

The first step in ICA usually comprises principal component analysis (PCA), 
also called (prc)whitcning [37,38,60] which minimizes linear correlations in 
the data. The number of sources {K < M) is estimated in PCA by taking 
only decorrelated components with highest eigenvalues. Then the demixing 
transformation splits into a, K x M prewhitening matrix V and a square 
rotation matrix R: 

W = RV. (6) 



Thus, the ICA problem reduces to searching the minimum of /(Y) under pure 
rotation Y = RZ of prcwhitcned vectors Z = VX. An important simplifica- 
tion comes from the fact that rotation matrix can be further decomposed 
into R = H^jHij, where each two-dimensional rotation R^j acts on (zj,Zj) 
and is obtained so that it minimizes pairwise mutual information /(yj,yj). 
Convergence to the minimum of /(Y) is achieved once all Rjj have been opti- 
mized iteratively (for implementation details we refer to [56]). Drawing a line 
to curve resolution methods, R resolves rotational ambiguity [59] by bringing 
the estimates as close to the initial independence assumption as possible. 

The use of proper preprocessing of data can significantly improve performance 
of curve resolution algorithms. Here we study how MILCA performs on second 
derivative (SD) spectral data which are known to be better suited for spectral 
analysis of complex mixtures [18,31,61,62] than original (raw) spectra. We will 
explain why such preprocessing improves ICA decomposition performance. 
Specifically, one proceeds from original mixture vectors X to their second 
derivatives X" with respect to frequency (wavelength), approximated either 
by finite differences 

~ x(z/'=-^) - 2x(j/^) + x{u^+^) , (7) 
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or by means of smoothing polynomial Savitzky-Golay differentiation [63]. 
Then, PCA and MI minimization can be performed on X", yielding W" and 
Y" (here and below double primes indicate quantities estimated using SD 
data, whereas superscript (0) will indicate estimates obtained in the original 
space). Due to linearity of Eqs. (5) and (7), W" represents an estimate for 
demixing matrix A"-*^. The estimates for pure spectra (in the original space) 
can be recovered by applying the demixing transformation W" on the original 
measured mixture signals 

Y = W"X. (8) 



The MILCA estimates for spectra Y and concentrations A — (W")~^ can be 
further refined iteratively through an alternating least squares (ALS, [11,64]) 
procedure. In this method the nonncgativity constraint on spectra and concen- 
trations is imposed in a postprocessing step (i.e., after mixture decomposition 
has been done by some curve resolution algorithm) . Specifically, one can write 
the ALS iterations as follows: 

(1) set j=0; initialize Yq = Y, Aq = A; 

(2) set negative entries of A^ to zero; 

(3) update spectra Y^+i = (AjAj)~^AjX using the new (nonnegative) A^; 

(4) set negative entries of Y^+i to zero; 

(5) update concentrations A^+i = XYj^]^(Yj+iYj^^)~^ using the new (non- 
negative) Yj_|_i; 

(6) j = J + 1; continue at (2) until convergence ir reached; 

We denote the result of ALS iterations by Y*^") and W*^"\ If initial Yq ^-nd 
Aq are already close the optimal solution, then the ALS procedure should 
give small corrections by eliminating the negativity of ICA estimates. Notice 
that the mutual information /(Yj) will in general slightly increase during 
the ALS postprocessing, although, as we will show below, the decomposition 
performance is in most cases considerably improved due to better fulfilled 
nonnegativity constraint. 



2.3 Measures of performance 



Bearing in mind the scaling and permutation ambiguities, a good quahty mea- 
sure for ICA results is the Amari error index [39,56,58], which quantifies how 
well the demixing transformation W agrees with the true mixing matrix A (if 
such is known) 

P(W,A)^-^V( ^^''j , + ^^''j , )-l, (9) 
2K /j^^ maxk \pik\ maxklPkjl 
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where pij — (WA)jj. The Amari index P vanishes if W deviates from 
only in scahng and permutation of elements, and it increases as the quality of 
decomposition becomes poor. 

Other measures of decomposition performance are applied to match recon- 
structed components with pure original sources. To compare our results with 
those of [15], we shall use the inner product of normalized pure and estimated 
spectral vectors 



In addition, we introduce a scaled overall measure of positivity of the K vectors 
Yi forming the matrix Y 



If all vectors are strictly positive, 7r(Y) = 1, while it is less than one otherwise. 



3 Spectral data 

Four exemplary data sets (A through D) taken from the literature were cho- 
sen to evaluate the performance of MILCA on typical spectral curve resolution 
problems. These included both artificial and experimental mixtures with var- 
ious number of components M, amount of data points and quality (e.g., 
noise level, presence of experimental background, line widths). Our choice of 
test problems was largely dictated by two requirements, which we consider 
essential: (i) data are publicly available, were previously used for the purposes 
of method evaluation and can be assessed later by alternative methods; (ii) the 
number of test problems is large enough to generate statistics of performance. 

3.1 Simulated 3-component mioctures (A) 

To compose a statistically representative test set of randomized mixtures we 
first collected a pool of 99 experimental infrared absorption spectra in the 
range 550—3830 cm~^ (822 data points each) selected from the NIST database 
[65]. This set was designed to contain organic compounds having common 
structural groups (halogen-, alkyl-, nitro-substituted benzene derivatives, phe- 
nols, alkanes; alcohols, thiols, amines, esters), so that their spectra have mul- 
tiple overlapping bands and are, thereby, mutually dependent. After that a 




(10) 



7r(Y) 




(11) 
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sample of 10000 random triples of three-component mixtures {M = K = 3) 
was constructed by randomly choosing normalized pure spectra from the pool 
and applying random mixing matrices A. The resulting sample represents 
then a set of blind source separation problems on strictly positive strongly 
dependent sources. 

3.2 Near-infrared data set (B) 

The second separation problem was chosen from the publicly available database 
[66] which was established in [67] to facilitate evaluation and comparison of 
chemometrical methods. The near-infrared (1100 — 2500 nm, 700 data points 
per spectrum) test sample first analyzed by Windig and Stephenson [62,68] 
consists of 140 experimental mixtures of five pure solvents (methylene chloride, 
2-butanol, methanol, dichloropropane, acetone). For this data set both the con- 
centrations in each mixture and the reference spectra of the pure components 
are available. The fractional concentrations of the four mixture components 
were chosen form the set {10%, 22.5%, 35%, 47.5%, 60%}. The fractional con- 
centration of the fifth component were set to make the concentrations add up 
to 100% [62]. 

3.3 6-component infrared mixtures (C) 

To compare MILCA directly with the BTEM algorithm we also analyze here 
the same data as those used in the original work of Widjaja et al. [15] taking 
14 randomized experimental 6-component mixtures of toluene, n-hexane, ace- 
tone, 3-phenylpropionaldehyde (aldehyde), 3,3-dimethylbut-l-ene (33DMB), 
and dichloromethane (DCM) [69]. To test performance, we used also the ref- 
erence (pure) spectra of these compounds given in [15] (all spectra are FT-IR 
in the range 950 — 3200 cm~^ with 5626 intensities). Notice that the authors of 
[15] measured and decomposed a set of 18 spectra (including the experimental 
background) . For more technical details see the footnote [69] . 

3.4 Raman spectra of brain samples (D) 

True bhnd source separation by MILCA was performed on the spectral data 
measured from human brain samples by Krafft et al. [70] . Taken from neuro- 
surgical resection material, the normal brain tissues of white and gray matter 
were subject to near-infrared Raman microspectroscopy, as were also the tu- 
mor specimens of glioma (astrocytoma WHO°?)) and meningioma {WH0°1) 
types. The selected range was 600 — 3500 cm~^ covered by 3282 wavelengths. 
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For each of the 4 samples, 20 to 42 measurements had been made resulting 
in 117 spectra. While variabihty across the sample is small, spectroscopically 
resolved differences between the four distinct specimens are noticeable. This 
allows to attempt blind mixture decomposition taking all 117 spectra together 
to extract common least dependent components in each sample and estimate 
their average concentrations. These results obtained blindly by MILCA were 
subsequently compared to the same Raman spectra of four pure substances 
(protein albumin from bovine serum, lipids from bovine brain extract, choles- 
terol from lanoline, and water) which were used in [70], assuming that they 
approximate the main constituents of the brain tissues. 



4 Results and discussion 

In order to illustrate potential pitfalls arising from dealing with highly depen- 
dent spectral signals, we first ran MILCA on a simple synthetic 2-component 
mixture of o-xylene and p-xylene, two compounds with very similar molecu- 
lar structures and highly overlapping spectra (the spectral data were taken 
from [65]). The distribution on Fig. la shows that prewhitening (PCA) of 
the initially strictly positive components already leads to decorrelated vectors 
that cannot be made positive by any further pure rotation. Typically, in cases 
like this, minimizing the MI results in one component being poorly resolved, 
in appreciable violations of positivity and deviations from the pure signal it- 
self (Figs. lb,d). Obviously, this is even more severe for higher dimensional 
mixtures. It is therefore likely that in a problem with originally nonnegative 
overlapping sources already the first step (prewhitening) may be counterpro- 
ductive . This may be be the reason why algorithms based on PCA followed 
by subsequent rotation of decorrelated vectors into the positive domain (e.g., 
[46]) have so far met with limited success only (see also [49]). Likely, ICA ap- 
proaches free of prewhitening might be more "compatible" with such strong 
constraints as nonnegativity. 

When performing ICA in derivative space, the nonnegativity of components 
reconstructed by means of Eq. (8) is much better preserved (see Figs. lc,e and 
figures shown below). The main reason for the improvements is that prepro- 
cessing with second derivatives removes slowly varying components from the 
spectra, and it is these slowly varying components which show most of the un- 
desired dependencies between pure spectra. Seen from this point of view, dis- 
carding slow components in any multi-resolution decomposition such as, e.g., a 
wavelet decomposition [71] would presumably have similar effect as taking sec- 
ond derivatives (notice that we speak here of wavelets in the frequency space, 
not in the time domain). This is opposed to selecting nonoverlapping bands 
which would correspond to discarding spectral regions. Indeed, one might ap- 
ply both techniques (band targeting and filtering out slow components) in 
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combination, but we shall not do this in the present paper. 

We next studied behavior of MILCA (with and without (pre/post)preprocessing) 
on a large sample of synthetic randomized 3-component mixtures (data set A) . 
To demonstrate the effect of SD preprocessing, Fig. 2a shows the mutual infor- 
mation of original triples of sources and those after taking second derivatives 
(Eq. (7)). There is a clear trend towards lowering MI by SD preprocessing. 
This means that taking derivatives filters out mostly the information which is 
common to all 3 spectra, and which is, therefore, detrimental to MILCA. Since 
SD acts as a high-pass filter, the effect can be traced to amplifying rapidly 
varying components. The observed gain in decomposition performance can be 
further understood by noting that the MI of estimated components is closer 
to the MI of pure components, if MILCA is performed in derivative space 
(Figs. 2b,c). Finally, Fig. 2d confirms that the match between MI of recon- 
structed signals and MI of original signals (both estimated in the original 
space) improves when SD preprocessing is involved (compare to Fig. 2c). 

A more direct measure of performance (or quality of decomposition) is given by 
the degree to which the estimated demixing transformation corresponds to the 
actual mixing matrix and how well the nonnegativity is preserved. For this we 
gathered the statistics of the Amari index P (Eq. (9)) and positivity measure 
TT (Eq. (11)) over the same test set A. As expected, the decomposition becomes 
more difficult as the sources become more dependent. But while this is very 
pronounced when MILCA is performed in the original space (Fig. 3a), it is 
hardly visible when second derivatives are used (Fig. 3b). In fact, MILCA with 
SD preprocessing was able to reconstruct successfully most of spectra from set 
A (Figs. 3d,f), in contrast to MILCA without SD preprocessing (Figs. 3c, e). 
Amari index values P < 0.1 indicate good decomposition quality, whereas 
P > 0.3 can be considered as unacceptable. Somewhat surprisingly, the strong 
(and expected) correlation between Amari index and nonnegativity seen when 
MILCA is done without filtering (Fig. 3e) is nearly completely eliminated with 
SD preprocessing (Fig. 3f). Again, this suggests that nonnegativity alone may 
not be the optimal target in a PCA-based spectral curve resolution. 

In the following applications to experimental mixtures we will only use MILCA 
with second derivative preprocessing. 

In order to see if postprocessing might further improve decomposition per- 
formance, we applied ALS with nonnegativity constraints (as described in 
Sec. 2.2, making 600 iterations for each mixture). Figure 4a shows that in 
the vast majority of cases the ALS corrections reduced the errors in mixture 
resolution and there was only a handful of mixtures (hardly seen in Fig. 4a), 
for which ALS iterations diverged far from the MILCA estimates and from 
the true solutions. The statistics of decomposition efficiency of the combina- 
tion second derivatives-MILCA-ALS is notably improved, with the peak in 
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the Amari index distribution shifted down to P ~ 0.05 (Fig. 4b) which is 
indicative of a rehably high performance. 

We now proceed to experimental mixture problems and comparisons with 
several chemometrics algorithms. 

The test problem B offered a large set of mixtures to be decomposed, while 
each spectrum was relatively scarce in the number of data points (wave- 
lengths). The five components have strong overlaps in the most informative 
range 2000—2500 nm (see Fig. 5), with only a few spectral features (also partly 
overlapping) at lower wavelengths to facilitate decomposition. Since the sig- 
nals appeared rather smooth without much measurement noise, we performed 
MILCA on second derivatives computed by finite differences (Eq. (7)) (we 
verified that Savitzky-Golay smoothing did not improve results in this case). 
The resolved components were found to match the reference spectra fairly 
well with only minor violations of nonnegativity. The largest mismatches are 
observed for methylene chloride and 2-butanol (Figs. 5a,c). 

To test how well the original concentrations were recovered, in the right panels 
of Fig. 5 we plot the estimated concentrations versus the original ones. Since 
the original concentrations were nearly quantized (see Sec. 3.2 and [62]), the 
vertical scatter of each point cloud indicates the inaccuracies of the source 
reconstruction. They are slightly larger than those obtained in [62] with SIM- 
PLISMA, but MILCA produced notably fewer false negative concentrations: 
they occurred only for methylene chloride and 2-butanol (Figs. 5b,d) which 
show also the worst spectral reconstruction (Figs. 5a,c). 

The next comparison was made with the BTEM algorithm [15,54]. For this 
we used the same data (set C) as in [15]. The large number of data points 
per spectrum and cleanness of the data even without background subtraction 
[69] allowed to work with second derivatives. In this case a Savitzky-Golay 
filter with 81 point window and 7th order polynomial gave best performance. 
Figure 6 depicts the resolved components plotted together with the reference 
(pure) spectra. Although the MILCA approach does not specifically focus on 
preserving certain spectral features, we find all the major bands reproduced 
reasonably well. Some distortions appear mostly in the overlap range 2800 — 
3200 cm~^ which was also a source of imperfections in the BTEM analysis 
[15]. In Table 1 we give a quantitative comparison between SIMPLISMA, 
BTEM, and MILCA, with the inner product i (Eq. (10)) used as performance 
measure. Compared to BTEM, MILCA without ALS demonstrates almost 
equally high performance while being more straightforward in not using band 
selection. However, on this data set the combination of SIMPLISMA and 
ALS [9,11] evidently outperformed both. On the other hand, as reported in 
[15], the IPCA [52] and OPA [72] algorithms were less efficient. We find that 
MILCA-I-ALS performs clearly better than BTEM and is close in efficiency to 
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SIMPLISMA+ALS, although much more statistics would be needed to rehably 
judge the relative capabilities of these algorithms. 

Finally, we proceed to a more realistic, true blind source separation problem 
in which no exact information on chemical composition is known. As non- 
invasive spectroscopic methods are more and more used in the analysis of 
biological materials and in vivo measurements [19,21,26,27,44,70,73,76], they 
offer an increasing variety of such "black" [2] mixture separation problems. 
Here we analyze the results of a Raman spectroscopy study of brain speci- 
mens [70] (set D) to see whether MILCA decomposition could be helpful in 
quantifying the abundances of major chemical species present in the brain. In 
[70] it was shown that a 4-component model can be used to explain a large 
part of the complex Raman spectra from brain samples. In their work Krafft 
et al. [70] assumed that the main spectral contributions are that of proteins, 
lipids, cholesterol and water, for which reference spectra were obtained. Then 
they determined concentrations of these components by making a linear fit 
to the experimental data. We attempted to do the same in a blind manner, 
applying MILCA to the original experimental set of spectra. The decompo- 
sition was complicated by the high level of measurement noise (Fig. 7a), so 
Savitzky-Golay smoothing derivatives (19 point window, polynomials of order 
7) were used to preprocess the data. The first four least dependent compo- 
nents resolved by MILCA are plotted in Figs. 7b-e (the fifth and higher order 
components contained predominantly noise). We found that each of them was 
indeed very similar to one of the spectra from the model set (dashed curves 
in Figs. 7b-e), which supports the result of [70] that the model set represents 
the main constituents of the tissues. In addition, MILCA gives also estimates 
for the mixing matrices, i.e., for the concentrations. Based on these, we found 
the following lipid-to-protein average concentration ratios: 6.5 (white matter), 
1.2 (gray matter), 0.5 (astrocytoma), and 0.4 (meningioma). This trend may 
be of diagnostic value and is consistent with the model fit parameters of [70] 
and with the studies by alternative methods [74,75,76]. 



5 Conclusions 

We have approached several spectral curve resolution problems by a new 
blind source separation algorithm based on accurate estimates of mutual in- 
formation (MILCA, available online at [77]). We showed that, with proper 
(pre/post)processing, decomposition into least dependent components is suf- 
ficient to achieve separation performance comparable to that of the state-of- 
the-art specialized chemometrics techniques. 

Least dependent component analysis is a general statistical method with a very 
wide range of potential applications (here we refer to the extensive ICA liter- 
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ature surveyed, e.g., in [38,39]). It was designed to perform completely blindly 
without using any a priori or empirical information such as the locations of 
pure variables or nonnegativity of original sources. An important advantage of 
MILCA over other ICA methods is the fact that it can use (in) dependencies 
down to very small scales. Thus it can make efficient use of high pass filtered 
signals where most of the dependencies due to overlapping bands have been 
reduced. In practice, this may be helpful, e.g., in cases when reliable localiza- 
tion of pure variables is complicated by severe overlaps or noise. Also, opposite 
to the methods that actually rely on nonnegativity, MILCA is apphcable to 
alternating sign signals, as is the case, e.g., in the EPR spectroscopy [22]. 

As our simulations on a large data set (10000 mixtures) have shown, the use 
of properly chosen preprocessing (second derivatives) leads to reduction of 
nuisance dependencies which would otherwise give unwanted contributions to 
mutual information. The latter is the only cost function in our method. In this 
sense, filtering out slowly varying contributions is consistent with the goal of 
finding least dependent components in the data performed by minimizing their 
mutual information. 

Statistics of performance indicates also that imposing nonnegativity constraint 
during a postprocessing stage (e.g., in the form of alternating least squares) 
can further improve decomposition. We anticipate that the combination of 
MILCA and ALS will prove competitive in the category of PCA-based curve 
resolution algorithms. 

The conceptual simplicity of the MILCA approach is expected to be advan- 
tageous in applied spectroscopy, and this contrasts it with novel but rather 
sophisticated curve resolution algorithms [15,54]. On the other hand, MILCA 
allows for several generalizations discussed already in [56] . These include mix- 
ing with small spectral shifts and testing the independence of spectra at dif- 
ferent frequencies. In addition, it should be in principle feasible to include the 
nonnegativity constraint directly into the cost function. 

However, more promising are algorithms based on stochastic (Monte Carlo 
type) minimization of mutual information subject to the nonnegativity con- 
straint. The principal advantage of such methods is that if constrained mini- 
mization is performed under affine transformations (such as, e.g., combinations 
of rotations and shears [48,49]), then the prewhitening (PCA) and violations 
of nonnegativity it induces can be eliminated altogether. Our preliminary re- 
sults (to be reported in a separate publication) indicate that the decompo- 
sition performance of these algorithms may be superior to their PCA-based 
counterparts. 
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Fig. 1. PC A and MILCA applied to the mixture of o- xylene and p-xylene. Scatter 
plots of (a) PCA components z\ and (^ = 1)2,..., 822) in the original space (zi 

and Z2), (b) ICA components {y^i^''^ and reconstructed in the original space, 

(c) recovered components (yf and 1/3 ) with ICA done in SD space (see Eq. (8)). 
Panel (d) shows yf^ (dashed curve) and the true pure spectra Si and S2 (solid 

curves). The component y2''^ is indistinguishable from S2. Panel (e) is the same for 
yi and y2 obtained by MILCA in derivative space. In this case, the estimates are 
almost indistinguishable from the true sources. Also shown are the values of the 
inner product i (Eq. (10)) and the Amari index P (Eq. (9)). 
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Fig. 2. Preprocessing and MILCA decomposition of 3-component mixtures (statis- 
tics over 10000 cases, data set A): (a) MI of the second derivatives of the sources 
S" plotted against Ml of the original sources S; (b) Ml of estimated SD spectra Y" 
against MI of the pure SD spectra S"; (c) similar to panel (b), but without using 
second derivatives at all (MILCA is performed in the original space, producing esti- 
mates Y(o)); (d) again similar to panel (c), but with MILCA done using SD signals, 
in which case the estimates in the original space Y are obtained through Eq. (8). 
Deviations from the straight lines indicate differences between quantities plotted on 
the vertical and horizontal axes. 



21 




Fig. 3. MILCA performance statistics (10000 mixtures, data set A). Left (right) 
panels show results for MILCA in the original (second derivative) space: (a,b) MI of 
sources plotted against the Amari indices P (Eq. (9)) of the decompositions; (c,d) 
distributions of Amari indices over the test set; (e,f) dependence of the positivity 
measure tt (Eq. (11)) on the Amari index P. 
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Fig. 4. Improvement in decomposition performance (a) and the resulting distribution 
(b) over the dataset A achieved by applying the alternating least squares with 
nonnegativity constraint to the estimates from MILCA done in the second derivative 
space (to compare with Fig. 3d, same scale). 
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Fig. 5. Left column: Near-infrared spectra of estimated (solid curves) and original 
pure (dashed curves) components (data set B). The components were methylene 
chloride (a), 2-butanol (c), methanol (e), dichloropropane (g), and acetone (i). Right 
column: Estimated versus actual concentrations of these components. The straight 
lines have unit slopes and pass through the origin. 
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Fig. 6. Data set C: reconstructed by MILCA (solid) and reference [15] (dashed) 
spectra of toluene (a), n-hexane (b), acetone (c), aldehyde (d), 33DMB (e), DCM 

(f)- 
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Fig. 7. Blind determination of composition of human brain tissue and tumors (data 
set D). Exemplary near-infrared Raman spectra (a) of normal white and gray mat- 
ter, astrocytoma and meningioma tumors (from bottom to up). The four (b-e) es- 
timated components (solid curves) plotted to compare with the model set (dashed 
curves) [70] consisting of protein (b), lipids (c), cholesterol (d) and water (e). 
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SIMPLISMA 


SIMPLISMA+ALS 


BTEM 


MILCA 


MILCA + ALS 


toluene 


0.971 


0.973 


0.954 


0.987 


0.994 


n-hexane 


0.994 


0.995 


0.992 


0.990 


0.991 


acetone 


0.866 


0.899 


0.886 


0.933 


0.943 


aldehyde 


0.943 


0.953 


0.899 


0.901 


0.902 


33DMB 


0.576 


0.963 


0.983 


0.964 


0.948 


DCM 


0.969 


0.967 


0.904 


0.909 


0.966 



Table 1 



Data set C: performance of MILCA in comparison to the other curve resolution 
algorithms as measured by the values of inner product, Eq. (10), of reconstructed 
and reference spectra shown on Fig. 6. Numerical data for SIMPLISMA and BTEM 
were taken from Table 3 of [15]. 
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